G3: Genes, Genomes, Genetics
Oxford University Press (OUP)
Preprints posted in the last 90 days, ranked by how well they match G3: Genes, Genomes, Genetics's content profile, based on 222 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Bush, Z. D.; Naftaly, A. F.; Dinwiddie, D.; Hillers, K. J.; Libuda, D. E.
Laboratory cultivation subjects model organisms to selective pressure and genetic drift that can result in the accumulation of many genomic and phenotypic differences over time. The nematode Caenorhabditis elegans has been used for research since the 1970s, and studies comparing the N2 Bristol and CB4856 Hawaiian isolates provided foundational knowledge about metazoan genome evolution. Most comparative genomics studies have used these isolates because their long-term geographical isolation promoted a high degree of genomic divergence within the species. Further, there is growing evidence of phenotypic differences between laboratory lineages of each wild type isolate after repeated independent lab cultivation of these strains. To examine the genomic divergence between different laboratory lineages of the Bristol and Hawaiian backgrounds, we first generated de novo genome assemblies of two Bristol and two Hawaiian lineages from Illumina and PacBio sequencing reads. Following genome assembly, we quantified single nucleotide polymorphisms (SNPs), short insertions/deletions (indels), and genomic structural variants (SVs). Between laboratory lineages of the Bristol isolate, we identified 25,432 SNPs, 5,202 indels, and 441 SVs. When aligning laboratory lineages of the Hawaiian isolate, we identified 4,518 SNPs, 1,188 indels, and 387 SVs. For both sets of comparisons, we find that SNPs and indels are broadly enriched in introns and depleted from coding sequences. In contrast to SNPs and indels, we find that genomic SVs are enriched in intergenic sequences. Taken together, our analyses reveal the accumulation of genomic divergence between lineages of Bristol and Hawaiian C. elegans from independent lab cultivation, and how these variants may underpin emergent phenotypic differences observed in the two most popularly used C. elegans wild type isolates.
Author Summary: Laboratory model organisms, like natural populations, are subject to evolutionary pressures and genomic changes during prolonged laboratory cultivation. In this study we comprehensively quantify SNPs, indels, and SVs between independently cultivated laboratory lineages of the C. elegans Bristol and Hawaiian isolates.
Percival-Smith, A.; Brabrook, C.
An expectation of a hypothesis that proposes cell-to-cell signalling pathways are redundant due to the redundancy of pathway terminal transcription factors (TFs) was tested by screening 35 signalling ligands (SLs) for rescue of a decapentaplegic (dpp) hypomorphic wing growth phenotype. The screen identified three examples of partial rescue: Hedgehog (HH), Semaphorin 1a (SEMA1A) and Wnt ortholog 2 (WNT2). HH overexpression with dppGAL4 may increase the expression of DPP activity from the hypomorphic dpp alleles. However, SEMA1A and WNT2 did not phenocopy ectopic expression of HH or DPP, and neither SEMA1A nor WNT2 was required for wing growth, suggesting that they substitute for DPP to partially restore wing growth. The WNT2 rescue was dependent on the Frizzled 4 (FZ4) WNT receptor, excluding the possibility that WNT2 weakly binds the DPP receptor. Although examples of phenotypic nonspecificity of SL function were identified, this is an expectation, and not direct proof, of the hypothesis of TF redundancy. Screen Report Summary: An expectation of a hypothesis proposing that cell-to-cell signalling pathways are redundant due to the redundancy of the pathway terminal transcription factors was tested by screening for replacement of one signalling ligand (DPP; SLa) with another (SLb) for wing growth. Three non-DPP SLs were identified in the screen of 35 SLs: HH, SEMA1A and WNT2. Genetic analysis of Sema1a and Wnt2 suggests functional complementation of dpp for wing growth, indicating that SEMA1A and WNT2 partially replace DPP for wing growth. Therefore, an expectation of the hypothesis is met.
Dant, A.; Pelosi, J.; Northing, P. C.; Dlugosch, K. M.
Premise: Centaurea melitensis (Asteraceae) is a problematic invader of grasslands globally, but little is known about its genetic makeup. Here we develop a reference genome to facilitate studies of its invasion history, genetic variation, and evolution. Methods: Inbred offspring of a single individual of C. melitensis from its invasion of California, USA were used for flow cytometry to estimate genome size, and for genomic DNA extraction. DNA was sequenced with PacBio HiFi technology (yield = 85.7 Gb). The genome was assembled with Hifiasm and annotated with BRAKER3. GENESPACE was used to compare gene order (synteny) with three other species within the subfamily Cichorioideae. Results: We estimated a mean genome size of 795.0 Mbp for C. melitensis, and our assembly totaled 696.6 Mbp in 48 contigs (N50 = 55.6 Mbp; BUSCO = 98%), with annotation of 25,157 protein-encoding genes. This included four telomere-to-telomere putative chromosomes, nine additional chromosome arms terminated by telomeric repeats, and a complete chloroplast genome. Synteny varied markedly across the genus and subfamily, suggesting a dynamic history of structural variation in the lineage of C. melitensis. Discussion: We provide a highly complete and contiguous genome assembly to facilitate the further study of genomic variation in C. melitensis.
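The contiguity figure quoted in this abstract (N50) follows a standard definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. A minimal sketch of that calculation is given below. The function name `n50` and the example lengths are illustrative assumptions, not taken from the preprint or from any assembler's output.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (any consistent unit)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2            # half of the total assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This contig is the one that carries the running sum past 50%.
            return length


# Hypothetical contig lengths in Mbp (illustrative only, not the real assembly).
example_lengths = [8, 8, 4, 3, 3, 2, 2, 2]
example_n50 = n50(example_lengths)
```

Because N50 depends only on the length distribution, it says nothing about correctness of the sequence, which is why abstracts such as this one usually report it alongside a completeness measure like BUSCO.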
Kroll, E.; Zoclanclounon, Y. A. B.; Urban, M.; Hill, R.; Hammond-Kosack, K. E.
Fungal genomics has expanded rapidly over the past 30 years, and recently the pace and breadth have further quickened for many taxa, although many taxonomic gaps persist. With three decades of rapid growth, fungal genomics now merits a re-examination of its history, progress, and unresolved taxonomic gaps. Here, we review the development of fungal genomics from early efforts such as the Fungal Genome Initiative to current progress driven by third-generation long-read sequencing. We have compiled and summarised publicly available fungal genomes to highlight trends in assembly quality, adoption of long-read technologies, and taxonomic representation. Notably, substantial phylogenetic gaps remain, particularly outside Dikarya, and significant challenges persist for unculturable taxa. This review identifies priorities for the fungal community, including: (1) coordinated efforts to close major taxonomic gaps across the fungal tree of life; (2) improved repository metrics to facilitate identification of high-quality assemblies; and (3) improved and standardised genome annotation, which is lacking for most assemblies. Together, these steps will support the development of reliable genomic resources that capture the full breadth of diversity across the fungal kingdom, generating foundational data for comparative genomics, evolutionary biology, functional studies, genetic studies and applied research.
Amarasinghe, A. P.; Pile, L. A.
Cellular metabolism and gene transcription are closely linked. The conserved transcriptional regulator SIN3 acts as a scaffold for histone deacetylase (HDAC)-containing complexes and is crucial for development, stress resistance, and overall organismal health. SIN3 regulates metabolic gene expression in Drosophila cultured cells; however, the extent of its role in coordinating responses to metabolic stress in whole organisms is incompletely understood. In this study, we explored how SIN3 controls glycolytic gene expression across developmental stages and under genetic and dietary disruption of glycolysis in Drosophila melanogaster. Focusing on four key glycolytic enzymes, phosphofructokinase (Pfk), enolase (Eno), pyruvate kinase (Pyk), and pyruvate dehydrogenase beta (Pdhb), we found that reducing Sin3A levels increases their expression in both larvae and adults, indicating that SIN3 plays a consistent role in balancing metabolic gene transcription. Genetic interaction experiments indicate that Sin3A interacts with Pyk and Eno, regulating transcription in a gene-specific manner. Disrupting glycolysis via genetic or dietary means alters glycolytic gene expression, and SIN3 modulates this response. These findings indicate that SIN3 functions as a metabolic sensor, regulating transcription in response to cellular metabolic stress. Additionally, we demonstrate that reducing Sin3A levels shortens Drosophila lifespan on both low- and high-sucrose diets, emphasizing the importance of SIN3 in longevity. Overall, these results show that SIN3 is a context-dependent regulator of glycolytic gene expression and lifespan in Drosophila, integrating metabolic signals with chromatin-based transcriptional regulation. Summary: To survive and thrive, organisms must adapt to distinct metabolic inputs. We investigated the response of the conserved transcriptional regulator SIN3 to metabolic stress and its control of glycolytic gene expression in Drosophila melanogaster.
By measuring glycolytic gene expression, testing genetic interactions, and assessing lifespan under genetic and dietary perturbations, we found that Sin3A knockdown elevates glycolytic gene expression in a gene-specific manner and decreases longevity. SIN3 also modulates transcriptional responses to disrupted glycolysis and influences lifespan under sucrose stress. These findings identify SIN3 as a context-dependent transcription regulator that links gene expression with organismal metabolic adaptation.
Sharma, R.; Wang, M.; Chen, X.; Carver, B. F.; Guttieri, M.; St. Amand, P.; Bernardo, A.; Bai, G.; Liu, S.; Ara, A. M.; Aoun, M.
Stripe rust and leaf rust, caused by Puccinia striiformis f. sp. tritici and P. triticina, respectively, are the most destructive wheat diseases in the southern Great Plains. Green Hammer is a hard red winter wheat (HRWW) cultivar released by Oklahoma State University in 2018 and has demonstrated a stable adult plant resistance to stripe rust and race-specific seedling resistance to leaf rust. To identify and map rust resistance loci, 109 doubled haploid (DH) lines derived from the cross between Green Hammer and another HRWW cultivar, Lonerider, were developed. Lonerider showed adult plant resistance to stripe rust but was susceptible to multiple P. triticina races. The DH lines were evaluated for stripe rust at the adult plant stage in greenhouse and field environments across Oklahoma, Kansas, and Washington, and for leaf rust at the seedling stage against seven U.S. P. triticina races and at the adult plant stage in Oklahoma and Texas. Genotyping-by-sequencing generated 6,078 polymorphic single-nucleotide polymorphisms used for genetic mapping. Quantitative trait loci (QTL) analysis identified 14 stripe rust and 8 leaf rust resistance QTL. For stripe rust, a major QTL in Green Hammer, QYr.osughln-2AS, was identified in the proximity of the 2NvS translocation. Three other major stripe rust resistance QTL were identified in Lonerider on chromosomes 2AL (two QTL) and 2BS (one QTL). For leaf rust, QLr.osughln-1DS and QLr.osughln-2DS.1 were the two major QTL identified in Green Hammer and most likely correspond to the all-stage resistance genes Lr21 and Lr39, respectively. In this study, we identified previously characterized genes as well as unknown genes that can be utilized in wheat breeding programs to enhance resistance to leaf rust and stripe rust.
Ahlinder, J.; Waldmann, P.
Current optimum contribution selection (OCS) implementations use point estimates of estimated breeding values (EBVs), potentially leading to suboptimal selections when individuals have uncertain genetic evaluations. We developed a framework assessing how EBV uncertainty affects OCS decisions through MCMC-based approaches using the COSMO optimizer in Julia, evaluated on Norway spruce (Picea abies, n = 5,525) and Loblolly pine (Pinus taeda, n = 926) populations. Agreement between point estimate (MAP-OCS) and MCMC-OCS was surprisingly low: mean overlap of only 26.6 (4.8) individuals in the Norway spruce genotyped subpopulation and 14.1 (3.6) in the full pedigree, with Loblolly pine intermediate at 16.0 (9.6). Despite this low individual-level agreement, selection frequency across MCMC iterations corresponded well with EBV rankings (Spearman's ρ = 0.782 for Norway spruce), confirming that higher-EBV individuals were preferentially selected under posterior uncertainty. To comprehensively quantify uncertainty impacts, we employed two complementary metrics: individual robustness scores measuring genetic gain stability upon candidate removal, and population-level contribution distribution metrics capturing concentration of genetic gain across selected individuals. Applying these metrics identified 25 high-risk individuals in Norway spruce and nine in Loblolly pine, and constrained exclusion of these individuals improved individual robustness by 16.5% in Loblolly pine (3.00% genetic gain loss) and 29.8% in Norway spruce (2.14% genetic gain loss). Our uncertainty-aware OCS framework successfully identifies unstable selections that may compromise long-term genetic gain, and we recommend assessing EBV uncertainty through posterior distributions and evaluating population-specific trade-offs when implementing uncertainty-aware selection strategies.
Wall, D.; Friedberg, A.; Lins, J.; Khalifa, R.; Partipilo, S.; Hart, A. C.
Dominant missense mutations in ATP1A3, encoding a Na+/K+ ATPase α3 subunit, can cause Alternating Hemiplegia of Childhood (AHC), but how these mutations lead to AHC remains unclear. Here, we establish the first C. elegans AHC models by introducing AHC-causing ATP1A3 patient mutations (D801N, E815K, L839P, and G947R) into the orthologous gene, eat-6, using CRISPR/Cas9. Homozygous C. elegans AHC model animals have recessive developmental defects. Heterozygous AHC model animals have dominant defects in neuromuscular junction (NMJ) function that are inconsistent with haploinsufficiency and dominant sleep or arousal defects. Previous work in a Drosophila G755S AHC model found that loss of a K+-dependent Na+/Ca2+ exchanger exacerbated neuronal defects. We introduced a loss-of-function allele of the orthologous C. elegans gene, ncx-4, into C. elegans AHC models; loss of ncx-4 function did not consistently alter C. elegans AHC model defects across alleles. Our results establish novel C. elegans models of AHC with robust phenotypes, demonstrate that AHC mutations disrupt NMJ function, and provide proof-of-concept for discovering cross-species modifiers of AHC-related phenotypes. Summary Statement: We report the first C. elegans models of Alternating Hemiplegia of Childhood. D801N, E815K, L839P, and G947R AHC model animals have recessive developmental defects and dominant neuromuscular defects.
Acharya, S. R.; Garcia-Abadillo, J.; Lyerly, J.; Brown-Guedira, G.; Jarquin, D.; Bandillo, N.
Genomic prediction models that account for genotype-by-environment (GxE) interactions have the potential to accelerate the rate of genetic gain for yield and agronomic performance, yet relatively few studies have applied GxE prediction in public soft red winter wheat (Triticum aestivum) breeding programs. In this study, we extended a reaction norm-based genomic prediction framework by integrating weather-based environmental covariates to more effectively capture genotype-environment interactions. Key agronomic traits, including seed yield, plant height, test weight, and heading date, were evaluated across 33 environments (location-year) using over 3,200 breeding lines from the North Carolina State University small grains breeding program. Multiple genomic prediction models were compared using several cross-validation (CV) schemes representing common breeding scenarios. Across traits, the reaction norm M5 model, which incorporates both GxE and genotype-by-environmental covariate interactions (GxO), achieved the highest prediction accuracy (PA) in CV2 (predicting incomplete field trials) and CV1 for yield and test weight (predicting new lines). The highest PA was observed for test weight under CV2 (0.54) and for yield under CV1 (0.41). Under CV0 (predicting new environments), the M3 model incorporating GxE produced the highest PA across traits, with the greatest accuracy for plant height (0.45), although differences among M2, M3, and M4 were small. Prediction under CV00 (predicting new lines in new environments) remained more challenging, with PA values of 0.10 to 0.20 across traits. Overall, our results demonstrate that integrating environmental covariates into genomic prediction models can improve predictive performance across diverse wheat-growing environments in North Carolina, supporting their utility for applied breeding efforts. Core Ideas: (1) Integrating genotype-by-environment (GxE) interactions with environmental covariates improves prediction accuracy across environments. (2) Model performance varies by prediction scenario, with different approaches performing best for new lines, incomplete trials, or new environments. (3) Prediction of new lines in new environments remains challenging. Plain Language Summary: This study explores how adding environmental information to genomic prediction models can improve prediction accuracy in a public winter wheat breeding program. Using data from multi-environment trials conducted across diverse conditions in North Carolina, we evaluated statistical models that capture how different wheat lines respond to changing environments. By incorporating weather data, we improved the ability to predict performance across locations and years. These findings provide practical insights for refining selection strategies and accelerating genetic gain in wheat breeding.
Lawson, M. E.; Sanow, K. A.; Chetana, K.; Taylor, E.; Morgan, A.; Flannery, D.; Elsie, C.; Rele, C. P.; Reed, L. K.; O'Rourke, K. S.
Gene model for the ortholog of Lst8 (Lst8) in the May 2011 (WUGSC dyak_caf1/DyakCAF1) Genome Assembly (GenBank Accession: GCA_000005975.1) of Drosophila yakuba. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.
Hill, J. L.; Ellis, J. P.; Williams, R. T.; Apodaca, A.; Basu, A.; Moore, A.; Osborne Nishimura, E.
At a mere 20 cells, the Caenorhabditis elegans intestine regulates metabolism, energy homeostasis, host defense, yolk production, and genetic aging, all while dynamically responding to its environment. How the intestine develops to carry out these disparate functions is unknown, and how cells differ along the length of the intestine is unclear. To address these questions, we performed single-cell RNA sequencing (scRNA-seq) on FACS-enriched intestinal cells from mixed-stage C. elegans embryos. The resulting single-cell transcriptomes of 974 cells organized into 13 clusters, suggesting a diversity of cell types and states. We used two post hoc approaches to ascribe identities to each cluster. First, genes with known developmental timing in early, mid, and late stages were used to place clusters in time, and smiFISH microscopy was used to fine-tune the assignments. Second, the eight late-stage clusters were assessed for their region of origin. To assign these clusters to anatomical regions, we identified marker genes for each cluster and assessed their expression along the anterior-to-posterior length of the intestine using smiFISH microscopy. Genes associated with growth and cell division were expressed in early stages, whereas genes associated with immune responses and metabolism were expressed later. Genes associated with biotic responses and RNA metabolism were the most likely to vary across the intestine's anterior-posterior axis. Finally, perturbation of anterior-localized intestinal transcripts affected intestinal function more robustly than perturbation of central or posterior-localized transcripts. Overall, this research illustrates the intrinsic heterogeneity across the 20 cells of the embryonic intestine and sets the stage for future work aimed at understanding cell-specific intestinal responses to diet and the environment. Article Summary: We investigate how the Caenorhabditis elegans intestine develops specialized functions on a spatiotemporal scale. We used single-cell RNA-sequencing to analyze embryonic intestinal cells and identify 13 distinct clusters. Combining gene expression analysis with microscopy, we assigned clusters to developmental stages and anatomical regions. Clusters associated with early intestine development express genes linked to growth and cell division, while later-stage clusters express genes involved in metabolism and immune responses. Genes varied across the intestine's anterior-to-posterior axis, and disrupting anterior-specific genes produced stronger functional effects. These findings reveal previously unrecognized intestinal diversity and provide insight into how intestinal cells specialize during development.
Brewer, B. J.; Martin, R.; Ramage, E.; Payen, C.; Di Rienzi, S. C.; Zhao, Y.; Zane, K.; Verhey, J.; Galey, M.; Miller, D. E.; Ong, G. T.; McKee, J. L.; Alvino, G. M.; Dunham, M. J.; Raghuraman, M. K.
Gene amplification is a potent driver of evolution and is thought to contribute to genetic diseases, including cancer. The yeast Saccharomyces cerevisiae is a powerful organism for understanding amplification mechanisms. When yeast is grown long term in sulfate-limiting chemostats, amplification of the gene that encodes the primary sulfate transporter, SUL1, is a common outcome. Here we describe a form of SUL1 amplification in which multiple copies of the right terminal region of chromosome II are appended in tandem to a native telomere. We find this form of amplicon when we delete the origin of replication next to SUL1 or delete a variety of genes involved in DNA metabolism. It is the only form of amplification found in a yku70Δ mutant, suggesting that unprotected telomeres are involved. We propose that these terminal addition events occur when the unprotected 3′ G1-3T telomeric sequence invades a short (~7 bp) internal telomere sequence (ITS) to begin a form of microhomology-mediated break-induced replication (mmBIR) that has been documented in type-I survivors of telomerase mutants. In addition to amplification of the right end of chromosome II, we also find that telomeres containing the sub-telomeric repeat Y′ experience similar tandem amplification events and show that their formation is reduced in a pol32Δ mutant, which lacks a gene required for mmBIR. Within individual amplicons the ITSs and Y′s are nearly identical, suggesting that the multiple copies of the amplified region are generated in a single mmBIR event that we describe as pseudo-rolling circle mmBIR. A similar amplification event at the P-telomere of human chromosome 18 has four copies of a ~54 kb region separated by ITSs of nearly identical size. This finding suggests that these additional copies of the terminal fragment of human chromosome 18 arose by the same pseudo-rolling circle mechanism, perhaps during a period of telomeric stress.
Author Summary: The human genome is peppered with duplicates (or higher numbers) of segments that are located at sites both nearby and distant from the original, ancestral segments. These Copy Number Variants, or CNVs, appear to be highly variable among different individuals and are being examined with great interest as potential loci associated with genetic disease. Experimentally determining how these CNVs arise and become distributed across the genome is nearly impossible using humans. We are using budding yeast as the model organism to explore mechanisms of gene amplification. In this work we show that destabilizing the ends of yeast chromosomes (telomeres), or interfering with genes involved in the replication, repair, or recombination of DNA, results in a specific form of segmental copy number increase that is initiated at telomeres. We propose that a telomere invades an internal chromosome site and sets up a pseudo-circular template for conservative DNA replication. The outcome is a chromosome with multiple, identical copies of a chromosome end arranged in tandem. We believe that it is also a major mechanism used by cells to repair telomeres that have become eroded during aging.
Ivanov, V.; Uludag, K. O.; Schöneberg, Y.; Schneider, J. M.; Kennedy, S.; Hamadou, A. B.; Vink, C. J.; Krehenwinkel, H.
Widow spiders of the genus Latrodectus are important animals for biomedical, pest and conservation research. Here, we present the assembled genomes of two closely related Latrodectus species: the Australian L. hasselti and the New Zealand endemic L. katipo. The genome of L. katipo consists of 13 scaffolds likely corresponding to chromosomes (90% of the total length) and 1267 short scaffolds (10%). It has a total length of 1.5 Gbp and BUSCO of 94.9%. The genome of L. hasselti consists of 379 scaffolds and has a total length of 1.7 Gbp and a BUSCO score of 95.4%. The repeat content is very similar in both genomes with a total proportion of 37.2% for L. katipo and 39.9% for L. hasselti. Genome annotation predicted 12706 and 15111 genes for L. katipo and L. hasselti respectively. An ortholog analysis shows large overlap between orthogroups suggesting either duplication events in L. hasselti or loss of genes in L. katipo.
Rodriguez-Rojas, P. C.; Oceguera-Figueroa, A. F.; Navarro-Siguenza, A. G.; Vazquez Miranda, H.
In this study, we characterized the genetic structure and reconstructed the demographic history of cactus wrens (Campylorhynchus brunneicapillus), a species endemic to the desert regions of North America that shows clear phenotypic and genotypic variation. We evaluated the effects of historical climate change on the structure and population dynamics of desert species using genomic data obtained through genotyping by sequencing (GBS) and applied a population structure analysis (FST and ADMIXTURE), revealing two genetically differentiated groups: one continental and another peninsular in Baja California. Subsequently, we implemented the MSMC2 coalescent model on data divided into autosomal regions and the Z sex chromosome to estimate changes in effective population size (Ne) through evolutionary time. Additionally, we developed ecological niche models (ENMs) projected to the Last Glacial Maximum (LGM), Last Interglacial (LIG), present, and future (2060-2080). Results indicate that both populations maintained moderate Ne values before the LGM and experienced severe bottlenecks (Ne ≈ 10²–10³), followed by a sustained expansion. However, recovery was limited to the Z chromosome of the peninsular population. These findings reveal how glaciations and interglacials shaped the evolutionary history of desert species and provide genomic evidence of the splitting of C. affinis from C. brunneicapillus. Article Summary: This research examines how climate changes shaped the genetic diversity of cactus wrens across North American warm deserts. Using coalescent methods, researchers tracked effective population size changes over 100,000 years; using ecological niche modeling, they predicted habitat suitability across climate periods. Results showed that continental and peninsular populations experienced bottlenecks during the Last Glacial Maximum, followed by demographic recovery during warm periods. However, the sex chromosome (Z) revealed male-biased demographic patterns in peninsular populations. Future projections indicated habitat suitability reductions for peninsular populations, highlighting conservation concerns. These findings demonstrate that past climate shaped the genetic diversity of cactus wrens.
Cakir, U.; Gabed, N.; Kaya, S.; Benedito, V. A.; Brunet, M. A.; Roucou, X.; Kryvoruchko, I. S.
Non-canonical open reading frames (ncORFs) are an emerging area of research that is quickly gaining momentum. Many peptides and proteins missed in initial annotation efforts (ncProts) were subsequently shown to be crucial for a wide range of biological processes. The discovery of ncORFs continues to improve the accuracy of loss-of-function studies because they often occupy the same genomic spaces as annotated ORFs. While databases of mutant phenotypes linked to genomic loci are available in a few species, none of these databases integrate the information on ncORFs present in already characterized loci. In this study, we introduce a nearly comprehensive loss-of-function phenomics dataset of Medicago truncatula (673 loci characterized over the past 30 years), which should become an integral part of the genome browser of this organism. We used this dataset to provide a critical analysis of the potential contribution of ncORFs to published phenotypes. We detected mass spectrometry (MS)-validated ncORFs in 10 functionally characterized genes, including major regulators of development and symbiotic relationships. We also found conserved ncORFs in 113 characterized genes, including four genes with highly conserved ncORFs. We show that in some studies, the contribution of ncORFs can be ruled out, while in others it cannot. Using real examples, we systematized ambiguities associated with ncORFs. Furthermore, we highlighted little-known trans effects of insertional mutagenesis on splicing as contributors to that ambiguity. Finally, our meta-analysis of published phenotypes indicates that different protein classes have significantly different (unique) proportions of unconditional, conditional, and neutral phenotypes, potentially reflecting their relative functional importance. 
Significance Statement: This study is the first to merge a nearly comprehensive inventory of loss-of-function studies in a eukaryotic organism with the information on novel MS-validated and conserved ncORFs.
Waples, R. S.
Interest in quantifying linkage disequilibrium (LD, non-random associations of alleles at different loci) has skyrocketed in recent years as researchers have focused on use of LD in genome-wide association studies (GWAS), for studying historical demography, and for estimating effective population size (Ne). The most widely used LD metric is r², the squared correlation of alleles at a pair of loci. Despite a half century of efforts, developing an unbiased expectation of r² as a function of the many factors that can affect it (physical linkage, genetic drift, selection, migration, mutation, mating systems) remains elusive. Furthermore, even when all of these other factors are absent, empirical estimates of r² are upwardly biased by sampling a finite number (S) of individuals, and that must be accounted for if one wants to focus on the desired signal of LD. Previous approaches to estimate [Formula] have been shown to be biased to greater or lesser degrees. The purpose of this short paper is to demonstrate that a simple and apparently exact expression for [Formula] does exist for the special case where sampling error is the only factor contributing to r², in which case [Formula] = 1/(S - 1). When other factors contribute heavily to LD, [Formula] shrinks toward 0 as empirical r² → 1. However, for estimating contemporary Ne with unlinked markers, empirical r² will generally be small and 1/(S - 1) will provide a robust estimate of [Formula].
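The 1/(S - 1) result for the sampling-only case can be checked numerically. The sketch below simulates two independent (unlinked, no-LD) loci in S diploid individuals, computes the squared Pearson correlation of genotype dosages, and averages over many replicates. Several choices here are assumptions made for illustration and are not taken from the preprint: using genotype dosage (0, 1, 2) as the allele measure, the allele frequencies of 0.5, the function names, and discarding replicates in which a locus is monomorphic in the sample.

```python
import random


def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # a locus is monomorphic in this sample; r2 is undefined
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def simulate_mean_r2(sample_size, replicates, freq_a=0.5, freq_b=0.5, seed=1):
    """Mean r2 between two independent loci when only sampling error contributes."""
    rng = random.Random(seed)
    total = 0.0
    used = 0
    for _ in range(replicates):
        # Genotype dosage = number of focal alleles carried by a diploid (0, 1 or 2).
        locus_a = [(rng.random() < freq_a) + (rng.random() < freq_a)
                   for _ in range(sample_size)]
        locus_b = [(rng.random() < freq_b) + (rng.random() < freq_b)
                   for _ in range(sample_size)]
        r2 = pearson_r2(locus_a, locus_b)
        if r2 is None:
            continue  # skip samples where r2 cannot be computed
        total += r2
        used += 1
    return total / used


if __name__ == "__main__":
    S = 20
    print("simulated mean r2:", simulate_mean_r2(S, 20000))
    print("1/(S - 1):        ", 1 / (S - 1))
```

With S = 20 the simulated mean should land close to 1/19, about 0.053. The simulation only illustrates the special case described in the abstract; it does not cover the situation where drift, linkage, or other forces also contribute to r².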
Lee, K. G. L.; Bartleet-Cross, C.; Gonzalez-Mollinedo, S.; Dong, S.; Pinto, A.; Lee, C. Z.; Sparks, A.; van de Velde, M.; Manarelli, M.-E.; Holden, T.; Tucker, R.; Maher, K. H.; Hipperson, H.; Slate, J.; Komdeur, J.; Richardson, D.; Dugdale, H.; Burke, T.
Understanding evolutionary processes is greatly facilitated by high-quality data on genetic variation. We report the development of a genomic toolkit for a recently bottlenecked, long-term studied species, the Seychelles warbler (Ptimerl dezil; Acrocephalus sechellensis). This toolkit comprises a reference genome assembled into 31 chromosomes, together with functional annotations and reference-panel-free imputation of whole-genome sequences from 1,935 individuals. The genomic data have been used to assign the sequenced individuals into a genetic pedigree. Individual genomic data are associated with a suite of phenotypic metadata, amassed from three decades of fieldwork in this closed, long-term monitored population. We compared sex and parentage assigned using the genomic data with the previously recorded sex and parentage metadata to identify and correct 41 DNA samples labelled with the wrong identity. This population resource enables a wide range of analyses that include, but are not limited to, phylogenetics, metabarcoding, recombination rates, linkage patterns, adaptation, heritability, demographic history, selection, and inbreeding estimates. We wish to encourage interest from researchers seeking to collaborate on future analyses and data collection. Overall, our methods demonstrate the potential of next generation sequencing and statistical tools to provide dense genomic datasets at large sample sizes for wild populations.
Lin, Y.-C.; Urbany, C.; Shlykova, A.; Hoelker, A.; Ouzunova, M.; Prester, T.; Pook, T.; Mayer, M.; Urzinger, S.; Schoen, C. C.
Securing sustainable crop production requires the genetic improvement of abiotic stress tolerance. Due to the broad range of environmental factors causing abiotic stress and complex genotype-by-environment interactions, it is crucial to understand the genetic basis of crop yield under suboptimal conditions. Here, we developed a dent maize Multi-parent Advanced Generation Inter-Cross (MAGIC) population comprising 388 doubled haploid (DH) lines. The population was derived from eight founders with varying stress tolerance, selected from a dent diversity panel evaluated for yield performance across a wide range of European environments. The MAGIC DH lines were genotyped via whole-genome sequencing (~5X coverage) and evaluated in seven testcross and 14 line per se trials for grain dry matter yield, leaf senescence, leaf rolling, anthesis-silking interval, and six additional agronomic traits. Genetic dissection identified 22 grain yield QTL, explaining 45% of the genetic variance. Under heat and drought stress, testcross grain yield correlated significantly with leaf senescence and leaf rolling measured in line per se trials. Bivariate multi-trait analysis showed that alleles for delayed senescence and reduced rolling at detected QTL generally exhibited positive effects on grain yield, suggesting that accumulating these favorable alleles could enhance yield performance. Incorporating these proxies into multi-trait genomic prediction models improved yield prediction accuracy, although gains were constrained by modest trait correlations. Given the comprehensive data, we also provide recommendations for optimizing sequencing depth and QTL mapping strategies in experimental maize populations.
Key message: This eight-founder MAGIC population represents a powerful resource for dissecting complex traits in maize, assessing the utility of drought proxy traits, and optimizing low-coverage whole-genome sequencing approaches.
O'Connor, L. M.; Moya, N. D.; Jhaveri, N. S.; Tanny, R. E.; Khorshidian, A.; Lyu, H.; Chamberlin, H. M.; Baird, S. E.; Andersen, E. C.
The nematode Caenorhabditis elegans was the first metazoan to have its genome completely sequenced and assembled. Since that time, researchers have continuously updated the reference genome and manually curated its approximately 20,000 genes. The closely related species, Caenorhabditis briggsae, has served as a comparative model because of its similar morphology, mode of reproduction, and patterns of intra-species genetic variation. However, the genomic resources for C. briggsae lag behind those for C. elegans, hindering comparative genomics studies between the species. Decades of experimentation have been performed in the AF16 reference strain genetic background, so we obtained high-coverage long-read sequencing and high-throughput chromosome conformation capture data to create an updated reference genome for an isogenic derivative of AF16, named CGC2. The CGC2 genome is vastly improved relative to the existing AF16 assemblies, with no unplaced sequence, no gaps, and telomere-to-telomere contiguity. To provide genomic resources for CGC2, we exploited deep RNA-seq libraries from all developmental stages to predict protein-coding gene annotations comparable in accuracy and completeness to the existing AF16 gene models. We also performed lift-over of 108 validated insertion-deletion variants to the updated coordinate system of the CGC2 genome to facilitate future mapping of mutations. In summary, we present an updated reference genome for the canonical AF16 reference strain with improved genomic resources to enable high-quality intra- and inter-species comparative studies.
Ivan, J.; Lanfear, R.
Many phylogenomic studies have used non-overlapping windows to address gene tree discordance across a set of aligned genomes. Recently, Ivan et al. (2025) proposed an information theoretic approach to choose an optimal window size given the alignment. However, this approach selects only a single fixed window size per chromosome, which is a useful first step but fails to account for variation in the size of non-recombining regions along each chromosome. Such variation is expected to occur due to the stochastic nature of recombination as well as the variation in recombination rates along chromosomes. In this study, we extend the approach of Ivan et al. (2025) to allow window sizes to vary across the chromosome, using a splitting-and-merging strategy that allows each window to be of an arbitrary length. We show that the new method outperforms the fixed-window approach in recovering gene tree topologies on a wide range of simulated datasets. Applying the new method to the genomes of seven Heliconius butterflies, we found that the average window sizes for the group ranged from 538 to 808 bp, but with a very similar distribution of gene tree topologies compared to previous studies that used fixed window sizes. For the genomes of great apes, the average window sizes ranged from 4.2 kb to 6.2 kb, with the proportion of the major topology (i.e., grouping human and chimpanzee together) reaching approximately 80%. In conclusion, our study highlights the limitations of using a fixed window size when recombination rates vary across the chromosomes, and proposes a splitting-and-merging approach that allows for variable window sizes across whole genome alignments.
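The general shape of a splitting-and-merging pass over fixed windows can be sketched as follows. This is a toy illustration only and not the authors' method: the abstract does not give the information theoretic criterion, so the sketch substitutes a hypothetical cost (within-window squared deviation of a made-up per-site signal plus a fixed per-window penalty), and every function name and parameter here is an assumption.

```python
def window_cost(signal, start, end, penalty):
    """Toy cost (assumption): squared deviation within the window plus a
    fixed penalty per window, standing in for a real model-selection score."""
    window = signal[start:end]
    mean = sum(window) / len(window)
    return sum((v - mean) ** 2 for v in window) + penalty


def split_window(signal, start, end, penalty, min_size):
    """Recursively split a window wherever splitting lowers the total cost."""
    if end - start < 2 * min_size:
        return [(start, end)]

    def cost_if_split(mid):
        return (window_cost(signal, start, mid, penalty)
                + window_cost(signal, mid, end, penalty))

    best = min(range(start + min_size, end - min_size + 1), key=cost_if_split)
    if cost_if_split(best) < window_cost(signal, start, end, penalty):
        return (split_window(signal, start, best, penalty, min_size)
                + split_window(signal, best, end, penalty, min_size))
    return [(start, end)]


def merge_windows(signal, windows, penalty):
    """Merge adjacent windows when one joint window costs no more than two."""
    if not windows:
        return []
    merged = [windows[0]]
    for start, end in windows[1:]:
        prev_start, prev_end = merged[-1]
        separate = (window_cost(signal, prev_start, prev_end, penalty)
                    + window_cost(signal, start, end, penalty))
        joined = window_cost(signal, prev_start, end, penalty)
        if joined <= separate:
            merged[-1] = (prev_start, end)
        else:
            merged.append((start, end))
    return merged


def variable_windows(signal, initial_size, penalty=1.0, min_size=2):
    """Start from fixed non-overlapping windows, then split and merge."""
    windows = []
    for start in range(0, len(signal), initial_size):
        end = min(start + initial_size, len(signal))
        windows.extend(split_window(signal, start, end, penalty, min_size))
    return merge_windows(signal, windows, penalty)


if __name__ == "__main__":
    # Hypothetical signal with one change point at position 10.
    toy_signal = [0.0] * 10 + [5.0] * 10
    print(variable_windows(toy_signal, initial_size=8))
```

On the toy signal, the fixed windows of size 8 straddle the change point at position 10; the split step cuts the straddling window there and the merge step rejoins the pieces on either side, giving the half-open windows (0, 10) and (10, 20). A real analysis would replace the toy cost with a likelihood-based criterion computed from the alignment.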